Austin Accidents

Data Analytics and Visualization Bootcamp

  • Bianca Hernandez
  • Neil Hsu
  • Ningning Mussomeli
  • Carlos Santillan

Dataset

Description: This is a countrywide car accident dataset that covers 49 states of the United States. The data was collected from February 2016 to December 2019 from several data providers, including two APIs that provide streaming traffic incident data. These APIs broadcast traffic data captured by a variety of entities, such as the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road networks. The dataset currently contains about 3.0 million accident records. See the data sources below to learn more about this dataset.

Data Sources:

Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. "A Countrywide Traffic Accident Dataset." 2019.

Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, and Rajiv Ramnath. "Accident Risk Prediction Based on Heterogeneous Sparse Data: New Dataset and Insights." In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2019.
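The analysis cells below filter on derived columns such as `Start_TimeYear`, `Start_TimeDayofweek` and `Start_TimeHour`, which are not part of the dataset description above. The following is a minimal sketch of how such columns could be built with pandas. It assumes the raw data has a `Start_Time` timestamp column and a `City` column, and that day of week uses pandas' `dt.dayofweek` encoding (0 = Monday). The helper name `add_start_time_parts` is hypothetical, and this is not necessarily how the notebook created these columns.

```python
import pandas as pd

def add_start_time_parts(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive year, day-of-week and hour columns from Start_Time.

    Assumes a 'Start_Time' column parseable by pd.to_datetime; the output
    column names mirror the ones used in this notebook.
    """
    out = df.copy()  # leave the caller's DataFrame unchanged
    start = pd.to_datetime(out['Start_Time'])
    out['Start_TimeYear'] = start.dt.year
    out['Start_TimeDayofweek'] = start.dt.dayofweek  # assumed encoding: 0 = Monday ... 6 = Sunday
    out['Start_TimeHour'] = start.dt.hour
    return out

# Tiny made-up rows (not real dataset records)
sample = pd.DataFrame({
    'City': ['Austin', 'Austin', 'Dallas'],
    'Start_Time': ['2018-03-05 07:45:00', '2018-03-09 17:10:00', '2019-01-01 08:00:00'],
})
parts = add_start_time_parts(sample)
print(parts[['City', 'Start_TimeYear', 'Start_TimeDayofweek', 'Start_TimeHour']])
```

Working on a copy keeps the raw columns intact, so the same source frame can be re-derived with a different encoding if needed.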

In [6]:
plot_map(tx_metro_cities_df,[30.2672,-97.7431])

Number of traffic accidents in Austin
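A count like this can be read directly from the `City` and `Start_TimeYear` columns. The following is a minimal sketch, assuming `tx_metro_cities_df` has one row per accident (as the filters in the cells below suggest). The function name `accidents_by_year` and the toy rows are invented for illustration.

```python
import pandas as pd

def accidents_by_year(df: pd.DataFrame, city: str) -> pd.Series:
    """Number of accident rows for one city, per year, in chronological order."""
    return df.loc[df['City'] == city, 'Start_TimeYear'].value_counts().sort_index()

# Made-up stand-in for tx_metro_cities_df
toy = pd.DataFrame({
    'City': ['Austin', 'Austin', 'Houston', 'Austin', 'Austin'],
    'Start_TimeYear': [2017, 2018, 2018, 2018, 2019],
})
austin = accidents_by_year(toy, 'Austin')
print(austin)
```

Sorting by index rather than by count keeps the years in chronological order, which reads more naturally as a trend.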

Analysis of traffic accidents in Austin for 2018 by day of the week

In [8]:
import matplotlib.pyplot as plt
condition = (tx_metro_cities_df['Start_TimeYear'] == year) & (tx_metro_cities_df['City'] == city)
fig, ax = plt.subplots(figsize=(10, 5))
# Count accidents per day of the week for the selected year and city only
tx_metro_cities_df.loc[condition, 'Start_TimeDayofweek'].value_counts().plot(
    kind='bar', title=f'Number of Accidents by Day in {year} for {city}', ax=ax)
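`value_counts()` orders the bars by frequency, so the days of the week appear in count order rather than calendar order. The following is a minimal sketch that reindexes the counts to Monday through Sunday. It assumes `Start_TimeDayofweek` uses pandas' `dt.dayofweek` integer encoding (0 = Monday). If the column holds day names instead, the same `reindex` call works with a list of names. `counts_in_calendar_order` and `DAY_NAMES` are hypothetical names.

```python
import pandas as pd

DAY_NAMES = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

def counts_in_calendar_order(dayofweek: pd.Series) -> pd.Series:
    """Accident counts per weekday, Monday first, with 0 for days that have no rows.

    Assumes the dt.dayofweek encoding (0 = Monday ... 6 = Sunday).
    """
    counts = dayofweek.value_counts().reindex(range(7), fill_value=0)
    counts.index = DAY_NAMES  # label bars with day names instead of codes
    return counts

# Made-up weekday codes standing in for tx_metro_cities_df.loc[condition, 'Start_TimeDayofweek']
codes = pd.Series([4, 4, 0, 2, 4, 0, 6])
ordered = counts_in_calendar_order(codes)
print(ordered)
```

The result plots the same way as before, for example with `ordered.plot(kind='bar', ax=ax)`. Using `fill_value=0` also keeps a bar position for any day with no accidents.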
In [10]:
# weekdays, morning_rush_hours, evening_rush_hour, colors and colors2 are defined in an earlier cell
condition = ((tx_metro_cities_df['Start_TimeYear'] == year)
             & (tx_metro_cities_df['City'] == city)
             & (tx_metro_cities_df['Start_TimeDayofweek'].isin(weekdays)))
df = tx_metro_cities_df[condition]

fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2, figsize=(10, 8), sharey=True)

def plot_rush_stack(ax, hours, palette, title):
    """Draw one stacked bar for a rush period, with one colored segment per hour."""
    # Start at the period total and subtract each hour's count, so the first
    # hour in the list is drawn on top and the last hour sits on the x-axis.
    margin_bottom = len(df[df['Start_TimeHour'].isin(hours)])
    for num, val in enumerate(hours):
        hour_counts = df.loc[df['Start_TimeHour'] == val, 'Start_TimeHour'].value_counts()
        if hour_counts.empty:
            continue  # no accidents in this hour, so there is no segment to draw
        margin_bottom -= hour_counts.sum()
        hour_counts.plot(kind='bar', ax=ax, bottom=margin_bottom,
                         color=palette[num], legend=False, title=title)

plot_rush_stack(ax0, morning_rush_hours, colors, 'Morning Rush 7:00 AM-9:59 AM')
plot_rush_stack(ax1, evening_rush_hour, colors2, 'Evening Rush 4:00 PM-6:59 PM')

fig.suptitle(f'Comparison between morning rush and evening rush for {city} in {year}')

plt.show()
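The stacked bars rely on a running `margin_bottom` that starts at the period total and drops by each hour's count. The following is a minimal sketch that computes the same `(hour, count, bottom)` segments without plotting, which makes the stacking order easy to check. It assumes the morning rush hours are 7, 8 and 9, as the panel title "7:00 AM-9:59 AM" suggests. The function name `stacked_segments` and the sample hours are hypothetical.

```python
import pandas as pd

def stacked_segments(hours_col: pd.Series, hours: list) -> list:
    """Return (hour, count, bottom) for each hour, mirroring the margin_bottom loop.

    The running bottom starts at the total for all listed hours and drops by
    each hour's count, so the first hour ends up on top and the last starts at 0.
    """
    counts = hours_col[hours_col.isin(hours)].value_counts()
    bottom = int(counts.sum())
    segments = []
    for hour in hours:
        count = int(counts.get(hour, 0))  # hours with no accidents contribute 0
        bottom -= count
        segments.append((hour, count, bottom))
    return segments

# Made-up start hours; 7, 8 and 9 stand in for the morning rush hours (an assumption)
start_hours = pd.Series([7, 7, 8, 8, 8, 9, 12, 17])
print(stacked_segments(start_hours, [7, 8, 9]))  # → [(7, 2, 4), (8, 3, 1), (9, 1, 0)]
```

The top of the stack equals the period total, so the height of each panel's bar is the morning or evening accident count being compared.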